
238 research outputs found

Object Detection in Videos with Tubelet Proposal Networks

Authors: Kang Kai, Li Hongsheng, Liu Xihui, Ouyang Wanli, Wang Xiaogang, Xiao Tong, Yan Junjie
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 10/04/2017

Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset. Unlike object detection in static images, object detection in videos depends critically on temporal information. To fully utilize temporal information, state-of-the-art methods are based on spatiotemporal tubelets, which are essentially sequences of associated bounding boxes across time. However, existing methods have major limitations in the quality and efficiency of tubelet generation. Motion-based methods can obtain dense tubelets efficiently, but the tubelets are generally only several frames long, which is not optimal for incorporating long-term temporal information. Appearance-based methods, which usually involve generic object tracking, can generate long tubelets but are usually computationally expensive. In this work, we propose a framework for object detection in videos that consists of a novel tubelet proposal network to efficiently generate spatiotemporal proposals, and a Long Short-Term Memory (LSTM) network that incorporates temporal information from tubelet proposals to achieve high object detection accuracy in videos. Experiments on the large-scale ImageNet VID dataset demonstrate the effectiveness of the proposed framework for object detection in videos.

Comment: CVPR 201

arXiv.org e-Print Archive

Crossref

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Authors: Huang Kaiyi, Li Zhenguo, Liu Xihui, Sun Kaiyue, Xie Enze
Publication date: 12/07/2023

Despite the stunning ability of recent text-to-image models to generate high-quality images, current approaches often struggle to compose objects with different attributes and relationships into a complex and coherent scene. We propose T2I-CompBench, a comprehensive benchmark for open-world compositional text-to-image generation, consisting of 6,000 compositional text prompts from 3 categories (attribute binding, object relationships, and complex compositions) and 6 sub-categories (color binding, shape binding, texture binding, spatial relationships, non-spatial relationships, and complex compositions). We further propose several evaluation metrics specifically designed to evaluate compositional text-to-image generation. We introduce a new approach, Generative mOdel fine-tuning with Reward-driven Sample selection (GORS), to boost the compositional text-to-image generation abilities of pretrained text-to-image models. Extensive experiments and evaluations are conducted to benchmark previous methods on T2I-CompBench, and to validate the effectiveness of our proposed evaluation metrics and GORS approach. The project page is available at https://karine-h.github.io/T2I-CompBench/.

arXiv.org e-Print Archive

SAM3D: Segment Anything in 3D Scenes

Authors: He Tong, Liu Xihui, Wu Xiaoyang, Yang Yunhan, Zhao Hengshuang
Publication date: 06/06/2023

In this work, we propose SAM3D, a novel framework that predicts masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) on RGB images without further training or finetuning. For a point cloud of a 3D scene with posed RGB images, we first predict segmentation masks of the RGB images with SAM, and then project the 2D masks onto the 3D points. We then merge the 3D masks iteratively with a bottom-up merging approach. At each step, we merge the point cloud masks of two adjacent frames with a bidirectional merging approach. In this way, the 3D masks predicted from different frames are gradually merged into the 3D masks of the whole 3D scene. Finally, we can optionally ensemble the result from SAM3D with over-segmentation results based on the geometric information of the 3D scene. We evaluate our approach on the ScanNet dataset, and qualitative results demonstrate that SAM3D achieves reasonable and fine-grained 3D segmentation results without any training or finetuning of SAM.

Comment: Technical Report. The code is released at https://github.com/Pointcept/SegmentAnything3
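The bottom-up merging step described in the SAM3D abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each 3D mask is represented as a set of point indices, and it uses a hypothetical overlap ratio with a hypothetical threshold (`thresh`) as a stand-in for the bidirectional merging criterion, which the abstract does not specify. The function names `overlap`, `merge_pair`, and `merge_scene` are illustrative only.

```python
from typing import List, Set

Mask = Set[int]  # assumption: a 3D mask is the set of point indices it covers


def overlap(a: Mask, b: Mask) -> float:
    """Shared points divided by the size of the smaller mask (assumed criterion)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_pair(masks_a: List[Mask], masks_b: List[Mask], thresh: float = 0.5) -> List[Mask]:
    """Merge the masks of two adjacent frames into one list of masks."""
    merged = [set(m) for m in masks_a]  # copy so the inputs stay untouched
    for mb in masks_b:
        best_i, best_s = -1, 0.0
        for i, ma in enumerate(merged):
            s = overlap(ma, mb)
            if s > best_s:
                best_i, best_s = i, s
        if best_i >= 0 and best_s >= thresh:
            merged[best_i] |= mb        # same object seen in both frames
        else:
            merged.append(set(mb))      # object seen only in the second frame
    return merged


def merge_scene(frames: List[List[Mask]], thresh: float = 0.5) -> List[Mask]:
    """Bottom-up merging: repeatedly merge adjacent pairs until one mask list remains."""
    if not frames:
        return []
    level = frames
    while len(level) > 1:
        nxt = [merge_pair(level[i], level[i + 1], thresh)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])       # odd frame out is carried to the next round
        level = nxt
    return level[0]


if __name__ == "__main__":
    frames = [
        [{0, 1, 2}, {10, 11}],   # masks projected from frame 0
        [{1, 2, 3}, {20, 21}],   # masks projected from frame 1
        [{10, 11, 12}],          # masks projected from frame 2
    ]
    print(merge_scene(frames))
```

Merging pairs of adjacent frames halves the number of mask lists at each round, so masks from neighboring views are reconciled first, before being combined into masks for the whole scene.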

arXiv.org e-Print Archive

RhoA phosphorylation mediated by Rho/RhoA-associated kinase pathway improves the anti-freezing potentiality of murine hatched and diapaused blastocysts

Authors: Gu Meichao, Guo Yong, Hemin Ni, Liu Yunhai, Pauciullo Alfredo, Sheng Xihui
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Institutional Research Information System University of Turin

Drag-A-Video: Non-rigid Video Editing with Point-based Interaction

Authors: Han Haoyu, Li Zhenguo, Liu Xihui, Teng Yao, Wu Yue, Xie Enze
Publication date: 05/12/2023

Video editing is a challenging task that requires manipulating videos in both the spatial and temporal dimensions. Existing methods for video editing mainly focus on changing the appearance or style of the objects in the video while keeping their structures unchanged. However, no existing method allows users to interactively "drag" any points of instances on the first frame to precisely reach the target points, with the other frames deformed consistently. In this paper, we propose a new diffusion-based method for interactive point-based video manipulation, called Drag-A-Video. Our method allows users to click pairs of handle points and target points, as well as masks, on the first frame of an input video. Our method then transforms the inputs into point sets and propagates these sets across frames. To precisely modify the contents of the video, we employ a new video-level motion supervision to update the features of the video, and introduce latent offsets to achieve this update at multiple denoising timesteps. We propose a temporally consistent point tracking module to coordinate the movement of the points in the handle point sets. We demonstrate the effectiveness and flexibility of our method on various videos. The website of our work is available here: https://drag-a-video.github.io/

arXiv.org e-Print Archive

A Novel Microspheres Formulation of Puerarin: Pharmacokinetics Study and In Vivo Pharmacodynamics Evaluations

Authors: Changli Wang, Hui Deng, Linjuan Dong, Shiyu Liu, Xiao Song, Xihui Bai
Publication venue: Hindawi Limited
Publication date: 01/01/2016

Crossref